Budget Feasible Mechanisms for Experimental Design

Thibaut Horel    Stratis Ioannidis

École Normale Supérieure    Technicolor

S. Muthukrishnan 
Rutgers University, Microsoft Research 

February 27, 2013 

Abstract 

In the classical experimental design setting, an experimenter E has access to a population of $n$ potential experiment subjects $i \in \{1, \ldots, n\}$, each associated with a vector of features $x_i \in \mathbb{R}^d$. Conducting an experiment with subject $i$ reveals an unknown value $y_i \in \mathbb{R}$ to E. E typically assumes some hypothetical relationship between the $x_i$'s and $y_i$'s, e.g., $y_i \approx \beta^T x_i$, and estimates $\beta$ from experiments. As a proxy for various practical constraints, E may select only a subset of subjects on which to conduct the experiment.

We initiate the study of budgeted mechanisms for experimental design. In this setting, E has a budget $B$. Each subject $i$ declares an associated cost $c_i > 0$ to be part of the experiment, and must be paid at least her cost. In particular, the Experimental Design Problem (EDP) is to find a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d + \sum_{i \in S} x_i x_i^T)$ under the constraint $\sum_{i \in S} c_i \le B$; our objective function corresponds to the information gain in the parameter $\beta$ that is learned through linear regression methods, and is related to the so-called D-optimality criterion. Further, the subjects are strategic and may lie about their costs. Thus, we need to design a mechanism for EDP with suitable properties.

We present a deterministic, polynomial time, truthful, budget feasible mechanism for EDP. By applying previous work on budget feasible mechanisms with submodular objective, one could only have derived either an exponential time deterministic mechanism or a randomized polynomial time mechanism. Our mechanism yields a constant factor ($\approx 12.98$) approximation, and we show that no truthful, budget-feasible algorithms are possible within a factor 2 approximation. We also show how to generalize our approach to a wide class of learning problems.
1 Introduction



In the classic setting of experimental design [22, ?], an experimenter E has access to a population of $n$ potential experiment subjects. Each subject $i \in \{1, \ldots, n\}$ is associated with a set of parameters (or features) $x_i \in \mathbb{R}^d$, known to the experimenter. E wishes to measure a certain inherent property of the subjects by performing an experiment: the outcome $y_i$ of the experiment on a subject $i$ is unknown to E before the experiment is performed.

Typically, E has a hypothesis on the relationship between the $x_i$'s and $y_i$'s. Due to its simplicity, as well as its ubiquity in statistical analysis, a large body of work has focused on linear hypotheses: i.e., it is assumed that there exists a $\beta \in \mathbb{R}^d$ such that

$$ y_i = \beta^T x_i + \varepsilon_i $$

for all $i \in \{1, \ldots, n\}$, where the $\varepsilon_i$ are zero-mean, i.i.d. random variables. Conducting the experiments and obtaining the measurements $y_i$ lets E estimate $\beta$, e.g., through linear regression.

The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, statistical analysis of user data is also a widespread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations. Beyond linear regression, there is a rich literature on estimation procedures, as well as on means of quantifying the quality of the produced estimate [?]. There is also an extensive theory on how to select subjects if E can conduct only a limited number of experiments, so that the estimation process returns a $\hat{\beta}$ that approximates the true parameter of the underlying population [15, 17, 3, ?].



We depart from this classical setup by viewing experimental design in a 
strategic setting, and by studying budgeted mechanism design issues. In our 
setting, experiments cannot be manipulated and hence measurements are reliable. E has a total budget of $B$ to conduct all the experiments. There is a cost $c_i$ associated with experimenting on subject $i$ which varies from subject to subject. This cost $c_i$ is determined by the subject $i$: it may be viewed as the cost $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$ to participate in the experiment; or, it might be the intrinsic worth of the data to the user. The economic aspect of
paying subjects has always been inherent in experimental design: experimenters 
often work within strict budgets and design creative incentives. Subjects often 
negotiate better incentives or higher payments. However, we are not aware of 
a principled study of this setting from a strategic point of view, when subjects 
declare their costs and therefore determine their payment. Such a setting is 
increasingly realistic, given the growth of these experiments over the Internet 
and associated data markets. 

Our contributions are as follows. 
• We initiate the study of the experimental design problem in the presence of a budget and strategic subjects. In particular, we formulate the Experimental Design Problem (EDP) as follows: the experimenter E wishes to find a set $S$ of subjects to maximize

$$ V(S) = \log\det\Big(I_d + \sum_{i \in S} x_i x_i^T\Big) \qquad (1) $$

subject to a budget constraint $\sum_{i \in S} c_i \le B$, where $B$ is E's budget. When subjects are strategic, the above problem can be naturally approached as a budget feasible mechanism design problem, as introduced by [23].

The objective function, which is key, is obtained by optimizing the information gain in $\beta$ when the latter is learned through ridge regression, and is related to the so-called D-optimality criterion [23, ?].

• We present a polynomial time, truthful mechanism for EDP, yielding a constant factor ($\approx 12.98$) approximation to the optimal value of (1). In contrast to this, we show that no truthful, budget-feasible mechanisms are possible for EDP within a factor 2 approximation.

We note that the objective (1) is submodular. Using this fact, applying previous results on budget feasible mechanism design under general submodular objectives [24, 8] would yield either a deterministic, truthful, constant-approximation mechanism that requires exponential time, or a non-deterministic, (universally) truthful, poly-time mechanism that yields a constant approximation ratio only in expectation (i.e., its approximation guarantee for a given instance may in fact be unbounded).

From a technical perspective, we present a convex relaxation of (1), and show that it is within a constant factor from the so-called multi-linear relaxation of (1), which in turn can be related to (1) through pipage rounding. We establish the constant factor to the multi-linear relaxation by bounding the partial derivatives of these two functions; we achieve the latter by exploiting convexity properties of matrix functions over the convex cone of positive semidefinite matrices.

In what follows, we describe related work in Section 2. We briefly review experimental design and budget feasible mechanisms in Section 3 and define EDP formally. In Section 4 we present our mechanism for EDP and state our main results, which are proved in Section 5. A generalization of our framework to machine learning tasks beyond linear regression is presented in Section 6.



2 Related work 

2.1 Budget Feasible Mechanisms for General Submodular Functions

Budget feasible mechanism design was originally proposed by [24]. Singer considers the problem of maximizing an arbitrary submodular function subject to a budget constraint in the value query model, i.e., assuming an oracle providing the value of the submodular objective on any given set. Singer shows that there exists a randomized, 112-approximation mechanism for submodular maximization that is universally truthful (i.e., it is a randomized mechanism sampled from a distribution over truthful mechanisms). Chen et al. [8] improve this result by providing a 7.91-approximate mechanism, and show a corresponding lower bound of 2 among universally truthful randomized mechanisms for submodular maximization.

The above approximation guarantees hold for the expected value of the randomized mechanism: for a given instance, the approximation ratio provided by the mechanism may in fact be unbounded. No deterministic, truthful, constant approximation mechanism that runs in polynomial time is presently known for submodular maximization. However, assuming access to an oracle providing the optimum in the full-information setup, Chen et al. propose a truthful, 8.34-approximate mechanism; in cases for which the full-information problem is NP-hard, such as the one we consider here, this mechanism is not poly-time, unless P = NP. Chen et al. also prove a $1 + \sqrt{2}$ lower bound for truthful deterministic mechanisms, improving upon an earlier bound of 2 by [23].



2.2 Budget Feasible Mechanism Design on Specific Problems

Improved bounds, as well as deterministic polynomial mechanisms, are known for specific submodular objectives. For symmetric submodular functions, a truthful mechanism with approximation ratio 2 is known, and this ratio is tight [?]. Singer also provides a 7.32-approximate truthful mechanism for the budget feasible version of Matching, and a corresponding lower bound of 2 [24]. Improving an earlier result by Singer, Chen et al. [8] give a truthful, $(2 + \sqrt{2})$-approximate mechanism for Knapsack, and a lower bound of $1 + \sqrt{2}$. Finally, a truthful, 31-approximate mechanism is also known for the budgeted version of Coverage [25]. Our results therefore add EDP to the set of problems for which a deterministic, polynomial time, constant approximation mechanism is known.



2.3 Beyond Submodular Objectives 

Beyond submodular objectives, it is known that no truthful mechanism with approximation ratio smaller than … exists for maximizing fractionally subadditive functions (a class that includes submodular functions) assuming access to a value query oracle [?]. Assuming access to a stronger oracle (the demand oracle), there exists a truthful, $O(\log n)$-approximate mechanism [11], as well as a universally truthful, $O(\log n / \log\log n)$-approximate mechanism for subadditive maximization [?]. Moreover, in a Bayesian setup, assuming a prior distribution among the agents' costs, there exists a truthful mechanism with a 768/512-approximation ratio [?]. Posted price, rather than direct revelation, mechanisms are also studied in [?].
2.4 Data Markets

A series of recent papers [?] consider the related problem of retrieving data from an unverified database, where strategic users may misreport their data to ensure their privacy. We depart by assuming that experiment outcomes are tamper-proof and cannot be manipulated. A different set of papers considers a setting where data cannot be misreported, but the utility of users is a function of the differential privacy guarantee an analyst provides them. We do not focus on privacy; any privacy costs in our setup are internalized in the costs $c_i$.

Our work is closest to the survey setup of [23], who also consider how to sample individuals with different features who report a hidden value at a certain cost. The authors assume a joint distribution between costs $c_i$ and features $x_i$, and wish to obtain an unbiased estimate of the expectation of the hidden value over the population, under the constraints of truthfulness, budget feasibility and individual rationality. Our work departs by learning a more general statistic (a linear model) rather than data means. We note that, as in [?], costs $c_i$ and features $x_i$ can be arbitrarily correlated in our work: the experimenter's objective (1) does not depend on their joint distribution.



3 Preliminaries 

3.1 Linear Regression and Experimental Design 

The theory of experimental design [?] considers the following formal setting. Suppose that an experimenter E wishes to conduct $k$ among $n$ possible experiments. Each experiment $i \in \mathcal{N} = \{1, \ldots, n\}$ is associated with a set of parameters (or features) $x_i \in \mathbb{R}^d$, normalized so that $\|x_i\|_2 \le 1$. Denote by $S \subseteq \mathcal{N}$, where $|S| = k$, the set of experiments selected; upon its execution, experiment $i \in S$ reveals an output variable (the "measurement") $y_i$, related to the experiment features $x_i$ through a linear function, i.e.,

$$ \forall i \in \mathcal{N}, \quad y_i = \beta^T x_i + \varepsilon_i, \qquad (2) $$

where $\beta$ is a vector in $\mathbb{R}^d$, commonly referred to as the model, and the $\varepsilon_i$ (the measurement noise) are independent, normally distributed random variables with mean 0 and variance $\sigma^2$.

For example, each $i$ may correspond to a human subject; the feature vector $x_i$ may correspond to a normalized vector of her age, weight, gender, income, etc., and the measurement $y_i$ may capture some biometric information (e.g., her red blood cell count, a genetic marker, etc.). The magnitude of the coefficient $\beta_j$ captures the effect that feature $j$ has on the measured variable, and its sign captures whether the correlation is positive or negative.

The purpose of these experiments is to allow E to estimate the model $\beta$. In particular, assume that the experimenter E has a prior distribution on $\beta$, i.e., $\beta$ has a multivariate normal prior with zero mean and covariance $\sigma^2 R^{-1} \in \mathbb{R}^{d \times d}$ (where $\sigma^2$ is the noise variance). Then, E estimates $\beta$ through maximum a posteriori estimation: i.e., finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption (2) and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following minimization [?]:

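$$ \hat{\beta} = \arg\max_{\beta \in \mathbb{R}^d} \Pr(\beta \mid y_S) = \arg\min_{\beta \in \mathbb{R}^d} \Big( \sum_{i \in S} (y_i - \beta^T x_i)^2 + \beta^T R \beta \Big) \qquad (3) $$

$$ \hat{\beta} = (R + X_S^T X_S)^{-1} X_S^T y_S \qquad (4) $$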

where $X_S = [x_i^T]_{i \in S} \in \mathbb{R}^{|S| \times d}$ is the matrix of experiment features and $y_S = [y_i]_{i \in S} \in \mathbb{R}^{|S|}$ the observed measurements. This optimization, commonly known as ridge regression, includes an additional quadratic penalty term compared to standard least squares estimation. Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$).
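The closed form (4) can be checked numerically against the penalized objective (3). The sketch below assumes NumPy and synthetic data; the sizes `d`, `k`, the noise level `sigma` and the helper names `ridge_map_estimate` and `ridge_objective` are illustrative choices, not taken from the paper.

```python
import numpy as np


def ridge_map_estimate(X_S, y_S, R):
    """MAP estimate (4): (R + X_S^T X_S)^{-1} X_S^T y_S.

    Solves a linear system rather than forming the inverse explicitly.
    The noise variance sigma^2 cancels out under the prior N(0, sigma^2 R^{-1}).
    """
    return np.linalg.solve(R + X_S.T @ X_S, X_S.T @ y_S)


def ridge_objective(beta, X_S, y_S, R):
    """Penalized least-squares objective minimized in (3)."""
    residual = y_S - X_S @ beta
    return residual @ residual + beta @ R @ beta


rng = np.random.default_rng(0)
d, k, sigma = 3, 20, 0.1
beta_true = rng.normal(size=d)
X_S = rng.normal(size=(k, d))
# Rescale rows whose norm exceeds 1 so that ||x_i||_2 <= 1, as assumed in Section 3.1.
X_S /= np.maximum(1.0, np.linalg.norm(X_S, axis=1, keepdims=True))
y_S = X_S @ beta_true + sigma * rng.normal(size=k)  # linear model (2)

R = np.eye(d)  # the choice R = I_d studied in the rest of the paper
beta_hat = ridge_map_estimate(X_S, y_S, R)
perturbed = beta_hat + 1e-3 * rng.normal(size=d)
print(ridge_objective(beta_hat, X_S, y_S, R) <= ridge_objective(perturbed, X_S, y_S, R))
```

Because (3) is a strictly convex quadratic, any perturbation of $\hat{\beta}$ can only increase the objective, which is what the last line checks.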

Let $V : 2^{\mathcal{N}} \to \mathbb{R}$ be a value function, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The classical experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S| \le k$. A variety of different value functions are used in the literature [22]. A value function that has natural advantages is the information gain:

$$ V(S) = I(\beta; y_S) = H(\beta) - H(\beta \mid y_S), \qquad (5) $$

which is the entropy reduction on $\beta$ after the revelation of $y_S$ (also known as the mutual information between $y_S$ and $\beta$). Hence, selecting a set of experiments $S$ that maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes the uncertainty on $\beta$, as captured by the entropy reduction of its estimator. Under the linear model (2) and the Gaussian prior, the information gain takes the form (up to the additive constant $-\frac{1}{2}\log\det R$, which vanishes when $R = I_d$):

$$ V(S) = \frac{1}{2} \log\det(R + X_S^T X_S). \qquad (6) $$

This value function is known in the experimental design literature as the D-optimality criterion [?].
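The closed form (6) follows from the differential entropy of a Gaussian, $\frac{1}{2}\log\det(2\pi e \Sigma)$: the prior covariance is $\sigma^2 R^{-1}$ and the posterior covariance is $\sigma^2 (R + X_S^T X_S)^{-1}$, so the $2\pi e$ and $\sigma^2$ factors cancel. A minimal NumPy sketch of this check, with illustrative helper names and a randomly drawn positive definite $R$:

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy of a Gaussian with covariance cov: 0.5 * log det(2*pi*e*cov)."""
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * logdet


def info_gain_from_entropies(X_S, R, sigma):
    """H(beta) - H(beta | y_S) for prior N(0, sigma^2 R^{-1}) and noise N(0, sigma^2)."""
    prior_cov = sigma**2 * np.linalg.inv(R)
    # The posterior covariance sigma^2 (R + X_S^T X_S)^{-1} does not depend on y_S.
    post_cov = sigma**2 * np.linalg.inv(R + X_S.T @ X_S)
    return gaussian_entropy(prior_cov) - gaussian_entropy(post_cov)


def info_gain_closed_form(X_S, R):
    """(1/2) log det(R + X_S^T X_S) - (1/2) log det R."""
    return 0.5 * (np.linalg.slogdet(R + X_S.T @ X_S)[1] - np.linalg.slogdet(R)[1])


rng = np.random.default_rng(1)
d, k = 4, 6
X_S = rng.normal(size=(k, d)) / 2
M = rng.normal(size=(d, d))
R = M @ M.T + np.eye(d)  # an arbitrary positive definite prior precision
print(info_gain_from_entropies(X_S, R, sigma=0.5), info_gain_closed_form(X_S, R))
```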

Our analysis will focus on the case of an isotropic prior, in which $R = I_d \in \mathbb{R}^{d \times d}$, so that the prior covariance is $\sigma^2 I_d$. Intuitively, this corresponds to the simplest prior, in which no direction of $\mathbb{R}^d$ is a priori favored; equivalently, it also corresponds to the case where the ridge regression estimation (3) performed by E has a penalty term $\|\beta\|_2^2$. A generalization of our results to arbitrary covariance matrices $R$ can be found in Section 6.



3.2 Budget-Feasible Experimental Design: Full Information Case

Beyond the cardinality constraint in classical experimental design discussed above, a budgeted version can also be considered. Each experiment is associated with a cost $c_i \in \mathbb{R}_+$. Moreover, the experimenter E is limited by a budget $B \in \mathbb{R}_+$. The cost $c_i$ can capture, e.g., the amount the subject $i$ deems sufficient to incentivize her participation in the experiment. In the full-information case, the experiment costs are common knowledge; as such, the optimization problem that the experimenter wishes to solve is:

Experimental Design Problem (EDP)

$$ \text{Maximize} \quad V(S) = \log\det(I_d + X_S^T X_S) \qquad (7a) $$

$$ \text{subject to} \quad \sum_{i \in S} c_i \le B \qquad (7b) $$



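EDP, as defined above, is NP-hard; to see this, note that Knapsack reduces to EDP with dimension $d = 1$ by mapping each item, with value $w_i$ and weight $c_i$, to an experiment with $x_i^2 = w_i$ and cost $c_i$ (after rescaling the values so that $w_i \le 1$, which does not change the optimal set). Since $V(S) = \log(1 + \sum_{i \in S} w_i)$ is increasing in the total value $\sum_{i \in S} w_i$, maximizing $V$ under the budget solves Knapsack.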

Note that $V$ is a submodular set function, i.e., $V(S) + V(T) \ge V(S \cup T) + V(S \cap T)$ for all $S, T \subseteq \mathcal{N}$. It is also monotone, i.e., $V(S) \le V(T)$ for all $S \subseteq T$, with $V(\emptyset) = 0$. Finally, it is non-negative, i.e., $V(S) \ge 0$ for all $S \subseteq \mathcal{N}$, since the matrix $X_S^T X_S$ is positive semi-definite for all $S \subseteq \mathcal{N}$. We denote by

$$ OPT = \max_{S \subseteq \mathcal{N}} \Big\{ V(S) \;\Big|\; \sum_{i \in S} c_i \le B \Big\} \qquad (8) $$

the optimal value achievable in the full-information case.
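On tiny instances, the properties of $V$ stated above and the value $OPT$ in (8) can be checked by exhaustive enumeration. The following NumPy sketch does exactly that; it is exponential in $n$ (consistent with EDP being NP-hard) and is meant only as an illustration, with hypothetical helper names and random data.

```python
import itertools

import numpy as np


def value(X, S):
    """V(S) = log det(I_d + sum_{i in S} x_i x_i^T), the EDP objective (7a); rows of X are x_i."""
    idx = np.array(sorted(S), dtype=int)
    A = np.eye(X.shape[1]) + X[idx].T @ X[idx]
    return np.linalg.slogdet(A)[1]


def all_subsets(n):
    return [frozenset(s) for r in range(n + 1) for s in itertools.combinations(range(n), r)]


def check_submodular_monotone(X, tol=1e-9):
    """Exhaustively test submodularity and monotonicity of V (2^n subsets, tiny n only)."""
    subsets = all_subsets(X.shape[0])
    vals = {S: value(X, S) for S in subsets}
    for S in subsets:
        for T in subsets:
            if vals[S] + vals[T] < vals[S | T] + vals[S & T] - tol:
                return False
            if S <= T and vals[S] > vals[T] + tol:
                return False
    return True


def brute_force_opt(X, costs, B):
    """A maximizer of (8) found by exhaustive search over budget-feasible sets."""
    feasible = [S for S in all_subsets(X.shape[0]) if sum(costs[i] for i in S) <= B]
    return max(feasible, key=lambda S: value(X, S))


rng = np.random.default_rng(2)
n, d = 6, 3
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
costs = rng.uniform(1.0, 3.0, size=n)
S_opt = brute_force_opt(X, costs, B=4.0)
print(sorted(S_opt), value(X, S_opt))
```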



3.3 Budget-Feasible Experimental Design: Strategic Case 

We study the strategic case, in which the costs $c_i$ are not common knowledge and their reporting can be manipulated by the experiment subjects. The latter are strategic and wish to maximize their utility, which is the difference between the payment they receive and their true cost. We note that, though subjects may misreport $c_i$, they cannot lie about $x_i$ (i.e., all public features are verifiable prior to the experiment) nor $y_i$ (i.e., the subject cannot falsify her measurement).

When the experiment subjects are strategic, the experimental design problem becomes a special case of a budget feasible reverse auction, as introduced by [13]. Formally, given a budget $B$ and a value function $V : 2^{\mathcal{N}} \to \mathbb{R}_+$, a mechanism $\mathcal{M} = (S, p)$ comprises (a) an allocation function¹ $S : \mathbb{R}_+^n \to 2^{\mathcal{N}}$ and (b) a payment function $p : \mathbb{R}_+^n \to \mathbb{R}_+^n$. Given the vector of costs $c = [c_i]_{i \in \mathcal{N}}$, the allocation function $S$ determines the set in $\mathcal{N}$ of experiments to be purchased, while the payment function returns a vector of payments $[p_i(c)]_{i \in \mathcal{N}}$. Let $s_i(c) = \mathbb{1}_{i \in S(c)}$ be the binary indicator of $i \in S(c)$. As usual, we seek mechanisms that have the following properties [24]:



¹ Note that $S$ would be more aptly termed a selection function, as this is a reverse auction, but we retain the term "allocation" to align with the familiar term from standard auctions.
• Normalization. Unallocated experiments receive zero payments, i.e.,

$$ s_i(c) = 0 \text{ implies } p_i(c) = 0. \qquad (9) $$

• Individual Rationality. Payments for allocated experiments exceed costs, i.e.,

$$ p_i(c) \ge c_i \cdot s_i(c). \qquad (10) $$

• No Positive Transfers. Payments are non-negative, i.e.,

$$ p_i(c) \ge 0. \qquad (11) $$

• Truthfulness/Incentive Compatibility. An experiment subject has no incentive to misreport her true cost. Formally, let $c_{-i}$ be a vector of costs of all agents except $i$. Then,

$$ p_i(c_i, c_{-i}) - s_i(c_i, c_{-i}) \cdot c_i \ge p_i(c'_i, c_{-i}) - s_i(c'_i, c_{-i}) \cdot c_i, \qquad (12) $$

for every $i \in \mathcal{N}$ and every two cost vectors $(c_i, c_{-i})$ and $(c'_i, c_{-i})$.

• Budget Feasibility. The sum of the payments should not exceed the budget constraint, i.e.,

$$ \sum_{i \in \mathcal{N}} p_i(c) \le B. \qquad (13) $$

Given that the full information problem EDP is NP-hard, we further seek mechanisms that have the following two additional properties:

• Approximation Ratio. The value of the allocated set should not be too far from the optimum value (8) of the full information case. Formally, there must exist some $\alpha \ge 1$ such that:

$$ OPT \le \alpha V(S). $$

We stress here that the approximation ratio is defined with respect to the maximal value in the full information case.

• Computational Efficiency. The allocation and payment functions should be computable in time polynomial in the parameters of the problem (e.g., $n$ and $d$).

As noted in [23, ?], budget feasible reverse auctions are single parameter auctions: each agent has only one private value (namely, $c_i$). As such, Myerson's Theorem [20] gives a characterization of truthful mechanisms.

Lemma 1 ([20]). A normalized mechanism $\mathcal{M} = (S, p)$ for a single parameter auction is truthful iff: (a) $S$ is monotone, i.e., for any agent $i$ and $c'_i \le c_i$, for any fixed costs $c_{-i}$ of agents in $\mathcal{N} \setminus \{i\}$, $i \in S(c_i, c_{-i})$ implies $i \in S(c'_i, c_{-i})$, and (b) agents are paid threshold payments, i.e., for all $i \in S(c)$, $p_i(c) = \inf\{c'_i : i \notin S(c'_i, c_{-i})\}$.
Myerson's Theorem allows us to focus on designing a monotone allocation function $S$. Then, the mechanism will be truthful as long as we give each agent her threshold payment, the caveat being that these payments must sum to at most $B$.
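For a monotone allocation, the threshold payment of Lemma 1(b) can always be approximated by bisection on the agent's report, calling the allocation as a black box. The sketch below is a generic illustration, not the closed-form payment of [24]; `threshold_payment` and the toy rule `k_cheapest` are hypothetical names, and the monotonicity of the rule is assumed rather than checked.

```python
def threshold_payment(allocation, costs, i, upper, tol=1e-9):
    """Approximate agent i's threshold payment p_i = inf{c'_i : i not in S(c'_i, c_{-i})}.

    Assumes (not checked) that `allocation` is monotone in agent i's cost, as in
    Lemma 1(a), and that agent i is not selected when reporting `upper`.
    """
    def selected(report):
        trial = list(costs)
        trial[i] = report
        return i in allocation(trial)

    if not selected(costs[i]):
        return 0.0  # normalization (9): unallocated agents are paid nothing
    lo, hi = costs[i], upper  # invariant: selected at lo, not selected at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if selected(mid):
            lo = mid
        else:
            hi = mid
    return hi


def k_cheapest(costs, k=2):
    """A toy monotone allocation (hypothetical, not from the paper): the k cheapest agents."""
    order = sorted(range(len(costs)), key=lambda j: (costs[j], j))
    return set(order[:k])


costs = [1.0, 2.0, 3.0, 4.0, 5.0]
payments = [threshold_payment(k_cheapest, costs, i, upper=10.0) for i in range(len(costs))]
print([round(p, 6) for p in payments])  # [3.0, 3.0, 0.0, 0.0, 0.0]
```

Each selected agent is paid the second-smallest cost among the other agents, which is at least her own cost, so individual rationality (10) holds in this example.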



4 Mechanism for EDP 

In this section we present our mechanism for EDP. Prior approaches to budget feasible mechanisms for submodular maximization build upon the full information case, which we discuss first.

4.1 Submodular Maximization in the Full Information Case

In the full-information case, submodular maximization under a budget constraint [?] relies on a greedy heuristic in which elements are added to the solution set according to the following greedy selection rule. Assume that $S \subseteq \mathcal{N}$ is the set constructed thus far; the next element $i$ to be included is the one which maximizes the marginal-value-per-cost:

$$ i = \arg\max_{j \in \mathcal{N} \setminus S} \frac{V(S \cup \{j\}) - V(S)}{c_j}. \qquad (14) $$

This is repeated until the sum of costs of elements in $S$ reaches the budget constraint $B$. Denote by $S_G$ the set constructed by this heuristic and let $i^* = \arg\max_{i \in \mathcal{N}} V(\{i\})$ be the element of maximum value. Then, the following algorithm:

$$ \text{if } V(\{i^*\}) \ge V(S_G) \text{ return } \{i^*\} \text{ else return } S_G \qquad (15) $$

has a constant approximation ratio [24].
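The greedy rule (14) and the comparison (15) can be sketched for any set function given as a Python callable. This is an illustrative sketch under the assumptions that every individual cost is at most $B$ and that the greedy loop stops at the first element that no longer fits; the additive objective below is a toy example, not EDP.

```python
def greedy_budgeted(V, costs, B):
    """Greedy rule (14): add the element of largest marginal value per unit cost while the
    total cost stays within B; V maps a frozenset of indices to a float."""
    S, spent = frozenset(), 0.0
    remaining = set(range(len(costs)))
    while remaining:
        i = max(remaining, key=lambda j: (V(S | {j}) - V(S)) / costs[j])
        if spent + costs[i] > B:
            break  # the budget B is reached
        S, spent = S | {i}, spent + costs[i]
        remaining.remove(i)
    return S


def full_information_allocation(V, costs, B):
    """Rule (15): return {i*} if it is worth at least as much as the greedy set S_G.
    Assumes every individual cost is at most B, so that {i*} is feasible."""
    S_G = greedy_budgeted(V, costs, B)
    i_star = max(range(len(costs)), key=lambda i: V(frozenset({i})))
    return frozenset({i_star}) if V(frozenset({i_star})) >= V(S_G) else S_G


weights = [10.0, 3.0, 3.0, 3.0]
costs = [10.0, 1.0, 1.0, 1.0]


def additive_value(S):
    """Toy additive (hence submodular) objective, for illustration only."""
    return sum(weights[i] for i in S)


print(sorted(greedy_budgeted(additive_value, costs, 10.0)))              # [1, 2, 3]
print(sorted(full_information_allocation(additive_value, costs, 10.0)))  # [0]
```

The example shows why the comparison with $\{i^*\}$ is needed: greedy prefers the cheap elements and ends with value 9, while the single expensive element is worth 10.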



4.2 Submodular Maximization in the Strategic Case 



In the strategic case, Algorithm (15) breaks incentive compatibility. Indeed, [25] notes that this allocation function is not monotone, which implies by Myerson's theorem (Lemma 1) that the resulting mechanism is not truthful.

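Let us denote by $OPT_{-i^*}$ the optimal value achievable in the full-information case after removing $i^*$ from the set $\mathcal{N}$:

$$ OPT_{-i^*} = \max_{S \subseteq \mathcal{N} \setminus \{i^*\}} \Big\{ V(S) \;\Big|\; \sum_{i \in S} c_i \le B \Big\}. \qquad (16) $$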



[24] and [8] prove that the following allocation:

$$ \text{if } C \cdot V(\{i^*\}) \ge OPT_{-i^*} \text{ return } \{i^*\} \text{ else return } S_G $$

yields an 8.34-approximation mechanism for an appropriate $C$ (see also Algorithm 1). However, $OPT_{-i^*}$, the maximum value attainable in the full-information case, cannot be computed in poly-time when the underlying problem is NP-hard (unless P = NP), as is the case for EDP.



Instead, [24] and [8] suggest replacing $OPT_{-i^*}$ by a quantity $OPT'_{-i^*}$ satisfying the following properties:

• $OPT'_{-i^*}$ must not depend on agent $i^*$'s reported cost and must be monotone: it can only increase when agents decrease their costs. This is to ensure the monotonicity of the allocation function.

• $OPT'_{-i^*}$ must be close to $OPT_{-i^*}$ to maintain a bounded approximation ratio.

• $OPT'_{-i^*}$ must be computable in polynomial time.

Such a quantity can be found by considering relaxations of the optimization problem (16). A function $L : [0,1]^n \to \mathbb{R}_+$ defined on the hypercube $[0,1]^n$ is a fractional relaxation of $V$ over the set $\mathcal{N}$ if $L(\mathbb{1}_S) = V(S)$ for all $S \subseteq \mathcal{N}$, where $\mathbb{1}_S$ denotes the indicator vector of $S$. The optimization program (16) extends naturally to such relaxations:

$$ OPT' = \max_{\lambda \in [0,1]^n} \Big\{ L(\lambda) \;\Big|\; \sum_{i \in \mathcal{N}} \lambda_i c_i \le B \Big\}. \qquad (17) $$



One of the main technical contributions of [8] and [25] is to come up with appropriate such relaxations for Knapsack and Coverage, respectively.

4.3 Our Approach 

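We introduce a relaxation $L$ specifically tailored to the value function of EDP:

$$ \forall \lambda \in [0,1]^n, \quad L(\lambda) = \log\det\Big(I_d + \sum_{i \in \mathcal{N}} \lambda_i x_i x_i^T\Big). \qquad (18) $$

The function $L$ is well known to be concave, and $-L$ is even self-concordant (see, e.g., [?]). In this case, the analysis of Newton's method for self-concordant functions in [?] shows that finding the maximum of $L$ to any precision $\varepsilon$ can be done in $O(\log\log \varepsilon^{-1})$ iterations. Being the solution to a maximization problem, $OPT'_{-i^*}$ satisfies the required monotonicity property: decreasing any reported cost only enlarges the feasible set of the maximization, so its optimal value can only increase.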
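To make the relaxed program (17) with the relaxation (18) concrete, the sketch below maximizes $L$ over $\{\lambda \in [0,1]^n : \sum_i c_i \lambda_i \le B\}$. It swaps in a plain Frank-Wolfe (conditional gradient) method instead of the Newton-type method discussed above: each step solves a fractional knapsack linear program, which is easy because the gradient $\partial_i L(\lambda) = x_i^T A(\lambda)^{-1} x_i$ is non-negative. It converges only at rate $O(1/t)$, so it is an illustration rather than the mechanism's solver; all names are hypothetical.

```python
import numpy as np


def relaxation_L(X, lam):
    """L(lambda) = log det(I_d + sum_i lambda_i x_i x_i^T), eq. (18)."""
    return np.linalg.slogdet(np.eye(X.shape[1]) + (X.T * lam) @ X)[1]


def grad_L(X, lam):
    """Partial derivatives d_i L(lambda) = x_i^T A(lambda)^{-1} x_i (all non-negative)."""
    A = np.eye(X.shape[1]) + (X.T * lam) @ X
    return np.einsum("ij,ji->i", X, np.linalg.solve(A, X.T))


def fractional_knapsack(g, costs, B):
    """Maximize g.s over {s in [0,1]^n : costs.s <= B} for g >= 0, greedily by g_i / c_i."""
    s, budget = np.zeros_like(g), B
    for i in np.argsort(-g / costs):
        if budget <= 0:
            break
        s[i] = min(1.0, budget / costs[i])
        budget -= s[i] * costs[i]
    return s


def maximize_L(X, costs, B, iters=500):
    """Frank-Wolfe ascent for OPT' in (17); each iterate is a convex combination of
    feasible points, hence feasible. Returns (approximate OPT', maximizer)."""
    lam = np.zeros(len(costs))
    for t in range(iters):
        s = fractional_knapsack(grad_L(X, lam), costs, B)
        lam += 2.0 / (t + 2.0) * (s - lam)
    return relaxation_L(X, lam), lam


rng = np.random.default_rng(3)
n, d = 5, 3
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
costs = rng.uniform(1.0, 3.0, size=n)
opt_relaxed, lam = maximize_L(X, costs, B=4.0)
print(opt_relaxed, lam.round(3))
```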

The main challenge will be to prove that $OPT'_{-i^*}$, for our relaxation $L$, is close to $OPT_{-i^*}$. We show this by establishing that $L$ is within a constant factor from the so-called multi-linear relaxation of (7a), which in turn can be related to (7a) through pipage rounding. We establish the constant factor to the multi-linear relaxation by bounding the partial derivatives of these two functions; we obtain the latter bound by exploiting convexity properties of matrix functions over the convex cone of positive semidefinite matrices.

In summary, the resulting mechanism for EDP is composed of 

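• the allocation function presented in Algorithm 1, and

Algorithm 1 Mechanism for EDP
1: $\mathcal{N} \leftarrow \mathcal{N} \setminus \{i \in \mathcal{N} : c_i > B\}$
2: $i^* \leftarrow \arg\max_{j \in \mathcal{N}} V(\{j\})$
3: $OPT'_{-i^*} \leftarrow \max_{\lambda \in [0,1]^n} \{ L(\lambda) \mid \lambda_{i^*} = 0, \sum_{i \in \mathcal{N} \setminus \{i^*\}} c_i \lambda_i \le B \}$
4: if $OPT'_{-i^*} < C \cdot V(\{i^*\})$ then
5:     return $\{i^*\}$
6: else
7:     $i \leftarrow \arg\max_{j \in \mathcal{N}} V(\{j\}) / c_j$
8:     $S_G \leftarrow \emptyset$
9:     while $c_i \le \frac{B}{2} \cdot \frac{V(S_G \cup \{i\}) - V(S_G)}{V(S_G \cup \{i\})}$ do
10:        $S_G \leftarrow S_G \cup \{i\}$
11:        $i \leftarrow \arg\max_{j \in \mathcal{N} \setminus S_G} \frac{V(S_G \cup \{j\}) - V(S_G)}{c_j}$
12:    end while
13:    return $S_G$
14: end if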



• the payment function, which pays each allocated agent $i$ her threshold payment as described in Myerson's Theorem. In the case where $\{i^*\}$ is the allocated set, her threshold payment is $B$ (she would have been dropped on line 1 of Algorithm 1 had she reported a higher cost). A closed-form formula for threshold payments when $S_G$ is the allocated set can be found in [24].
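The allocation rule of Algorithm 1 can be sketched end to end as follows (payments omitted). This is an illustrative NumPy sketch under stated assumptions: `opt_relaxed` approximates line 3 with Frank-Wolfe iterations, a stand-in for the Newton-type solver discussed in Section 4.3, so it slightly underestimates $OPT'_{-i^*}$; the constant $C$ is taken from (22); all helper names and both toy instances are hypothetical.

```python
import numpy as np


def logdet_value(X, w):
    """log det(I_d + sum_i w_i x_i x_i^T): V(S) when w = 1_S, L(w) for fractional w."""
    return np.linalg.slogdet(np.eye(X.shape[1]) + (X.T * w) @ X)[1]


def opt_relaxed(X, c, B, allowed, iters=300):
    """Approximate line 3: max L(lambda) s.t. sum_i c_i lambda_i <= B and lambda_j = 0
    for j not in `allowed`. Frank-Wolfe stands in for the Newton-type solver."""
    lam = np.zeros(len(c))
    for t in range(iters):
        A = np.eye(X.shape[1]) + (X.T * lam) @ X
        grad = np.einsum("ij,ji->i", X, np.linalg.solve(A, X.T))  # x_i^T A^{-1} x_i
        s, budget = np.zeros(len(c)), B
        for j in sorted(allowed, key=lambda k: -grad[k] / c[k]):  # fractional knapsack
            s[j] = min(1.0, max(budget, 0.0) / c[j])
            budget -= s[j] * c[j]
        lam += 2.0 / (t + 2.0) * (s - lam)
    return logdet_value(X, lam)


def mechanism_allocation(X, c, B, C):
    """Allocation rule of Algorithm 1 (threshold payments omitted); assumes every x_i != 0."""
    n = len(c)

    def V(S):
        w = np.zeros(n)
        w[list(S)] = 1.0
        return logdet_value(X, w)

    N = [i for i in range(n) if c[i] <= B]                           # line 1
    i_star = max(N, key=lambda j: V({j}))                            # line 2
    allowed = [j for j in N if j != i_star]
    if opt_relaxed(X, c, B, allowed) < C * V({i_star}):              # lines 3-4
        return {i_star}                                              # line 5
    S_G, rest = set(), list(N)                                       # line 8
    while rest:
        i = max(rest, key=lambda j: (V(S_G | {j}) - V(S_G)) / c[j])  # lines 7 and 11
        if c[i] > B / 2 * (V(S_G | {i}) - V(S_G)) / V(S_G | {i}):    # line 9
            break
        S_G.add(i)                                                   # line 10
        rest.remove(i)
    return S_G


e = np.e
C = (8 * e - 1 + np.sqrt(64 * e**2 - 24 * e + 9)) / (2 * (e - 1))  # constant (22)

X1 = np.array([[1.0, 0.0], [0.01, 0.0], [0.0, 0.01]])  # one dominant agent
print(mechanism_allocation(X1, np.ones(3), 1.0, C))    # {0}

theta = np.linspace(0.0, np.pi, 30, endpoint=False)
X2 = 0.1 * np.column_stack([np.cos(theta), np.sin(theta)])  # many small, diverse agents
print(sorted(mechanism_allocation(X2, np.ones(30), 30.0, C)))
```

In the first instance the relaxed value without $i^*$ is tiny, so $\{i^*\}$ is returned; in the second, many cheap agents together dominate any single one and the greedy branch with the proportional stopping rule of line 9 is taken.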

We can now state our main result: 

Theorem 1. The allocation described in Algorithm 1, along with threshold payments, is truthful, individually rational and budget feasible. Furthermore, there exists an absolute constant $C$ such that, for any $\varepsilon > 0$, the mechanism runs in time $O(\mathrm{poly}(n, d, \log\log \varepsilon^{-1}))$ and returns a set $S^*$ such that:

$$ OPT \le \frac{10e - 3 + \sqrt{64e^2 - 24e + 9}}{2(e-1)} V(S^*) + \varepsilon \simeq 12.98\, V(S^*) + \varepsilon. $$

The value of the constant $C$ is given by (22) in Section 5.1. In addition, we prove the following simple lower bound.

Theorem 2. There is no 2-approximate, truthful, budget feasible, individually 
rational mechanism for EDP. 



5 Proofs 

5.1 Proof of Theorem 1

We now present the proof of Theorem 1. Truthfulness and individual rationality follow from monotonicity and threshold payments. Monotonicity and budget feasibility follow the same steps as the analysis of [8]; for the sake of completeness, we restate their proof in the Appendix.

The complexity of the mechanism is given by the following lemma. 

Lemma 2 (Complexity). For any $\varepsilon > 0$, the complexity of the mechanism is $O(\mathrm{poly}(n, d, \log\log \varepsilon^{-1}))$.

Proof. The value function $V$ in (7a) can be computed in time $O(\mathrm{poly}(n, d))$, and the mechanism only involves a polynomial number (at most $O(n^2)$) of queries to the function $V$. The function $L$ is concave and $-L$ is self-concordant (see [?]), so for any $\varepsilon$, its maximum can be found to a precision $\varepsilon$ in $O(\log\log \varepsilon^{-1})$ iterations of Newton's method. Each iteration can be done in time $O(\mathrm{poly}(n, d))$. Thus, line 3 of Algorithm 1 can be computed in time $O(\mathrm{poly}(n, d, \log\log \varepsilon^{-1}))$. Hence the allocation function's complexity is as stated. □

Finally, we prove the approximation ratio of the mechanism. We use the following lemma, which establishes that $OPT'$, the optimal value (17) of the fractional relaxation $L$ under the budget constraint, is not too far from $OPT$.

Lemma 3 (Approximation). $OPT' \le 2\,OPT + 2 \max_{i \in \mathcal{N}} V(\{i\})$.

The proof of Lemma 3 is our main technical contribution, and can be found in Section 5.2.

We also use the following lemma from [8], which bounds $OPT$ in terms of the value of $S_G$, as computed in Algorithm 1, and $i^*$, the element of maximum value.

Lemma 4 ([8]). Let $S_G$ be the set computed in Algorithm 1 and let $i^* = \arg\max_{i \in \mathcal{N}} V(\{i\})$. We have:

$$ OPT \le \frac{e}{e-1}\big(3V(S_G) + 2V(i^*)\big). $$


Using Lemmas 3 and 4 we can complete the proof of Theorem 1 by showing that, for any $\varepsilon > 0$, if $OPT'_{-i^*}$, the optimal value of $L$ when $i^*$ is excluded from $\mathcal{N}$, has been computed to a precision $\varepsilon$, then the set $S^*$ allocated by the mechanism is such that:

$$ OPT \le \frac{10e - 3 + \sqrt{64e^2 - 24e + 9}}{2(e-1)} V(S^*) + \varepsilon. \qquad (19) $$

To see this, let $OPT'_{-i^*}$ be the true maximum value of $L$ subject to $\lambda_{i^*} = 0$, $\sum_{i \in \mathcal{N} \setminus \{i^*\}} c_i \lambda_i \le B$. Assume that on line 3 of Algorithm 1 a quantity $\tilde{L}$ such that $\tilde{L} - \varepsilon \le OPT'_{-i^*} \le \tilde{L} + \varepsilon$ has been computed (Lemma 2 states that this can be done within our complexity guarantee), and that it is $\tilde{L}$ that is compared to $C \cdot V(i^*)$ on line 4.

If the condition on line 4 of the algorithm holds, then

$$ V(i^*) > \frac{\tilde{L}}{C} \ge \frac{1}{C} OPT'_{-i^*} - \frac{\varepsilon}{C} \ge \frac{1}{C} OPT_{-i^*} - \frac{\varepsilon}{C}, $$

as $L$ is a fractional relaxation of $V$. Also, $OPT \le OPT_{-i^*} + V(i^*)$; hence,

$$ OPT \le (1 + C) V(i^*) + \varepsilon. \qquad (20) $$

If the condition does not hold, by observing that $OPT'_{-i^*} \le OPT'$ and applying Lemma 3, we get

$$ V(i^*) \le \frac{\tilde{L}}{C} \le \frac{1}{C}\big(OPT'_{-i^*} + \varepsilon\big) \le \frac{1}{C}\big(2\,OPT + 2V(i^*)\big) + \frac{\varepsilon}{C}. $$

Applying Lemma 4,

$$ C\,V(i^*) \le \frac{2e}{e-1}\big(3V(S_G) + 2V(i^*)\big) + 2V(i^*) + \varepsilon. $$

Thus, if $C$ is such that $C(e-1) - 6e + 2 > 0$,

$$ V(i^*) \le \frac{6e}{C(e-1) - 6e + 2} V(S_G) + \frac{(e-1)\varepsilon}{C(e-1) - 6e + 2}. $$

Finally, using Lemma 4 again, we get

$$ OPT \le \frac{3e}{e-1}\Big(1 + \frac{4e}{C(e-1) - 6e + 2}\Big) V(S_G) + \frac{2e\varepsilon}{C(e-1) - 6e + 2}. \qquad (21) $$

To minimize the coefficients of $V(i^*)$ and $V(S_G)$ in (20) and (21) respectively, we wish to choose $C$ that minimizes

$$ \max\Big(1 + C, \; \frac{3e}{e-1}\Big(1 + \frac{4e}{C(e-1) - 6e + 2}\Big)\Big). $$

This function has two minima, only one of which is such that $C(e-1) - 6e + 2 > 0$. This minimum is

$$ C = \frac{8e - 1 + \sqrt{64e^2 - 24e + 9}}{2(e-1)}. \qquad (22) $$

For this minimum, $\frac{2e\varepsilon}{C(e-1) - 6e + 2} \le \varepsilon$. Placing the expression of $C$ in (20) and (21) gives the approximation ratio in (19), and concludes the proof of Theorem 1.

□
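The constants in (19)-(22) can be checked with a few lines of arithmetic. The short Python sketch below (standard library only) evaluates $C$ from (22), confirms that it equalizes the two coefficients of (20) and (21), that $C(e-1) - 6e + 2 > 0$, that the $\varepsilon$ coefficient in (21) is at most 1, and that moving $C$ slightly in either direction increases the worst coefficient.

```python
import math

e = math.e
root = math.sqrt(64 * e**2 - 24 * e + 9)
C = (8 * e - 1 + root) / (2 * (e - 1))          # constant (22)
ratio = (10 * e - 3 + root) / (2 * (e - 1))      # coefficient in (19)
D = C * (e - 1) - 6 * e + 2                      # must be positive


def worst_coefficient(C):
    """max of the coefficient of V(i*) in (20) and of V(S_G) in (21)."""
    D = C * (e - 1) - 6 * e + 2
    return max(1 + C, 3 * e / (e - 1) * (1 + 4 * e / D))


print(round(ratio, 2), round(C, 2), round(2 * e / D, 3))  # 12.98 11.98 0.867
```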

5.2 Proof of Lemma 3

We need to prove that for our relaxation $L$ given by (18), $OPT'$ is close to $OPT$ as stated in Lemma 3. Our analysis follows the pipage rounding framework of [?].

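This framework uses the multi-linear extension $F$ of the submodular function $V$. Let $P_\lambda(S)$ be the probability of choosing the set $S$ if we select each element $i$ in $\mathcal{N}$ independently with probability $\lambda_i$:

$$ P_\lambda(S) = \prod_{i \in S} \lambda_i \prod_{i \in \mathcal{N} \setminus S} (1 - \lambda_i). $$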
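Then, the multi-linear extension $F$ is defined by:

$$ F(\lambda) = \mathbb{E}_{S \sim P_\lambda}\big[V(S)\big] = \sum_{S \subseteq \mathcal{N}} P_\lambda(S) V(S). $$

For EDP the multi-linear extension can be written:

$$ F(\lambda) = \mathbb{E}_{S \sim P_\lambda}\Big[\log\det\Big(I_d + \sum_{i \in S} x_i x_i^T\Big)\Big]. \qquad (23) $$

Note that the relaxation $L$ that we introduced in (18) follows naturally from the multi-linear relaxation by swapping the expectation and the $\log\det$ in (23):

$$ L(\lambda) = \log\det\Big(\mathbb{E}_{S \sim P_\lambda}\Big[I_d + \sum_{i \in S} x_i x_i^T\Big]\Big). $$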
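For small $n$, both relaxations can be evaluated exactly: $F$ by summing over all $2^n$ subsets, $L$ by a single log-determinant, since $\mathbb{E}[I_d + \sum_{i \in S} x_i x_i^T] = I_d + \sum_i \lambda_i x_i x_i^T$. The NumPy sketch below (hypothetical helper names, random unit-norm features) evaluates both and can be used to observe the sandwich $\frac{1}{2}L(\lambda) \le F(\lambda) \le L(\lambda)$ that Lemma 6 establishes, as well as $F = L$ at integral points.

```python
import itertools

import numpy as np


def logdet_I_plus(X, w):
    """log det(I_d + sum_i w_i x_i x_i^T)."""
    return np.linalg.slogdet(np.eye(X.shape[1]) + (X.T * w) @ X)[1]


def multilinear_F(X, lam):
    """F(lambda) = sum_S P_lambda(S) V(S), computed exactly over all 2^n subsets."""
    total = 0.0
    for bits in itertools.product((0.0, 1.0), repeat=len(lam)):
        b = np.array(bits)
        prob = np.prod(np.where(b == 1.0, lam, 1.0 - lam))  # P_lambda(S) for S = {i : b_i = 1}
        total += prob * logdet_I_plus(X, b)
    return total


def relaxation_L(X, lam):
    """L(lambda): the log det is applied to E[I_d + sum_{i in S} x_i x_i^T] instead."""
    return logdet_I_plus(X, lam)


rng = np.random.default_rng(4)
n, d = 6, 3
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm features, so ||x_i||_2 <= 1
lam = rng.uniform(size=n)
print(0.5 * relaxation_L(X, lam), multilinear_F(X, lam), relaxation_L(X, lam))
```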



The proof proceeds as follows: 






• First, we prove that $F$ admits the following rounding property: let $\lambda$ be a feasible element of $[0,1]^n$; it is possible to trade one fractional component of $\lambda$ for another until one of them becomes integral, obtaining a new element $\bar{\lambda}$ which is both feasible and for which $F(\bar{\lambda}) \ge F(\lambda)$. Here, by feasibility of a point $\lambda$, we mean that it satisfies the budget constraint $\sum_{i=1}^n \lambda_i c_i \le B$. This rounding property is referred to in the literature as cross-convexity (see, e.g., [?]), or $\varepsilon$-convexity [?]. This is stated and proven in Lemma 5 and allows us to bound $F$ in terms of $OPT$.

• Next, we prove the central result of bounding $L$ appropriately in terms of the multi-linear relaxation $F$ (Lemma 6).

• Finally, we conclude the proof of Lemma 3 by combining Lemma 5 and Lemma 6.

Lemma 5 (Rounding). For any feasible $\lambda \in [0,1]^n$, there exists a feasible $\bar{\lambda} \in [0,1]^n$ such that at most one of its components is fractional and $F(\lambda) \le F(\bar{\lambda})$.

Proof. We give a rounding procedure which, given a feasible $\lambda$ with at least two fractional components, returns some feasible $\lambda'$ with one less fractional component such that $F(\lambda) \le F(\lambda')$.

Applying this procedure recursively yields the lemma's result. Let us con- 
sider such a feasible A. Let i and j be two fractional components of A and let 
us define the following function: 

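$$F_\lambda(\varepsilon) = F(\lambda_\varepsilon) \quad\text{where}\quad \lambda_\varepsilon = \lambda + \varepsilon\Big(e_i - \frac{c_i}{c_j}e_j\Big).$$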

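It is easy to see that if $\lambda$ is feasible, then:

$$\forall\varepsilon\in\Big[\max\Big(-\lambda_i,\ (\lambda_j-1)\frac{c_j}{c_i}\Big),\ \min\Big(1-\lambda_i,\ \lambda_j\frac{c_j}{c_i}\Big)\Big],\quad \lambda_\varepsilon \text{ is feasible.} \tag{24}$$

Indeed, moving along $e_i - \frac{c_i}{c_j}e_j$ leaves $\sum_k \lambda_k c_k$ unchanged, and the interval keeps the $i$-th and $j$-th components in $[0,1]$.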
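Furthermore, the function $F_\lambda$ is convex; indeed:

$$\begin{aligned} F_\lambda(\varepsilon) = \mathbb E_{S'\sim P^\lambda_{\mathcal N\setminus\{i,j\}}}\Big[ &(\lambda_i+\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\}) + (\lambda_i+\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\ &+ (1-\lambda_i-\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\}) + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big].\end{aligned}$$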

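Thus, $F_\lambda$ is a degree 2 polynomial in $\varepsilon$ whose dominant coefficient is:

$$\frac{c_i}{c_j}\,\mathbb E_{S'\sim P^\lambda_{\mathcal N\setminus\{i,j\}}}\Big[V(S'\cup\{i\}) + V(S'\cup\{j\}) - V(S'\cup\{i,j\}) - V(S')\Big],$$

which is non-negative by submodularity of $V$. Hence, the maximum of $F_\lambda$ over the interval given in (24) is attained at one of its endpoints, at which either the $i$-th or the $j$-th component of $\lambda_\varepsilon$ becomes integral. □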
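The rounding step of Lemma 5 can be sketched in Python as follows. This is a minimal illustration, assuming a brute-force $F$ (exponential in $n$), a hypothetical coverage function in place of the EDP objective, and a tolerance `tol` for deciding when a coordinate counts as integral; the names `pipage_round` and `coverage` are illustrative only.

```python
from itertools import combinations


def multilinear(V, lam):
    """F(lambda) for lam indexed 0..n-1, by enumerating all subsets of {0, ..., n-1}."""
    n = len(lam)
    total = 0.0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            p = 1.0
            for i in range(n):
                p *= lam[i] if i in S else 1.0 - lam[i]
            total += p * V(frozenset(S))
    return total


def pipage_round(V, lam, costs, tol=1e-9):
    """Repeat the step of Lemma 5 until at most one coordinate is fractional."""
    lam = list(lam)
    while True:
        frac = [k for k, v in enumerate(lam) if tol < v < 1.0 - tol]
        if len(frac) < 2:
            return lam
        i, j = frac[0], frac[1]
        r = costs[i] / costs[j]
        # Interval (24): lambda + eps * (e_i - r * e_j) stays in [0, 1]^n,
        # and sum_k lambda_k * c_k is unchanged since eps * c_i - eps * r * c_j = 0.
        lo = max(-lam[i], (lam[j] - 1.0) / r)
        hi = min(1.0 - lam[i], lam[j] / r)
        endpoints = []
        for eps in (lo, hi):
            cand = list(lam)
            cand[i] += eps
            cand[j] -= eps * r
            # Snap coordinates within tol of 0 or 1 to exact integers (rounding noise).
            cand = [0.0 if v <= tol else 1.0 if v >= 1.0 - tol else v for v in cand]
            endpoints.append(cand)
        # F is convex in eps on this interval, so the better endpoint is no worse than eps = 0.
        lam = max(endpoints, key=lambda point: multilinear(V, point))


# Hypothetical monotone submodular V: coverage of the union of the chosen sets.
SETS = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]


def coverage(S):
    return len(set().union(*(SETS[k] for k in S)))


lam0 = [0.5, 0.5, 0.5, 0.5]
costs = [1.0, 2.0, 1.0, 2.0]  # hypothetical costs
lam1 = pipage_round(coverage, lam0, costs)
print(lam1, multilinear(coverage, lam0), multilinear(coverage, lam1))
```

Each iteration makes the $i$-th or $j$-th coordinate integral, so the loop runs at most $n - 1$ times; the budget $\sum_k \lambda_k c_k$ is preserved up to the snapping tolerance.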

Lemma 6. For all $\lambda \in [0,1]^n$, $\tfrac12 L(\lambda) \le F(\lambda) \le L(\lambda)$.

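Proof. The bound $F_{\mathcal N}(\lambda) \le L_{\mathcal N}(\lambda)$ follows from the concavity of the $\log\det$ function (Jensen's inequality applied to (23)). To show the lower bound, we first prove that $\tfrac12$ is a lower bound of the ratio $\partial_iF(\lambda)/\partial_iL(\lambda)$, where $\partial_i$ denotes the partial derivative with respect to the $i$-th variable.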

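Let us start by computing the derivatives of $F$ and $L$ with respect to the $i$-th component. Observe that

$$\partial_i P_{\mathcal N}^\lambda(S) = \begin{cases} P_{\mathcal N\setminus\{i\}}^\lambda(S\setminus\{i\}) & \text{if } i\in S,\\ -P_{\mathcal N\setminus\{i\}}^\lambda(S) & \text{if } i\in\mathcal N\setminus S.\end{cases}$$

Hence,

$$\partial_iF(\lambda) = \sum_{\substack{S\subseteq\mathcal N\\ i\in S}} P_{\mathcal N\setminus\{i\}}^\lambda(S\setminus\{i\})\,V(S) - \sum_{\substack{S\subseteq\mathcal N\\ i\in\mathcal N\setminus S}} P_{\mathcal N\setminus\{i\}}^\lambda(S)\,V(S).$$

Now, using that every $S$ such that $i \in S$ can be uniquely written as $S'\cup\{i\}$, we can write:

$$\partial_iF(\lambda) = \sum_{\substack{S\subseteq\mathcal N\\ i\in\mathcal N\setminus S}} P_{\mathcal N\setminus\{i\}}^\lambda(S)\big(V(S\cup\{i\}) - V(S)\big).$$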

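The marginal contribution of $i$ to $S$ can be written as

$$\begin{aligned} V(S\cup\{i\}) - V(S) &= \tfrac12\log\det(I_d + X_S^TX_S + x_ix_i^T) - \tfrac12\log\det(I_d + X_S^TX_S)\\ &= \tfrac12\log\det\big(I_d + x_ix_i^T(I_d + X_S^TX_S)^{-1}\big)\\ &= \tfrac12\log\big(1 + x_i^TA(S)^{-1}x_i\big),\end{aligned}$$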






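where $A(S) = I_d + X_S^TX_S$, and the last equality follows from Sylvester's determinant identity [3]. Using this,

$$\partial_iF(\lambda) = \tfrac12\sum_{S\subseteq\mathcal N\setminus\{i\}} P_{\mathcal N\setminus\{i\}}^\lambda(S)\log\big(1 + x_i^TA(S)^{-1}x_i\big).$$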
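As a numerical sanity check of the marginal-contribution formula, the Python sketch below compares $V(S\cup\{i\}) - V(S)$ computed from two determinants with $\tfrac12\log(1 + x_i^TA(S)^{-1}x_i)$. It assumes $d = 2$, so that determinants and inverses have closed forms, uses the factor $\tfrac12$ of the derivation above, and the feature vectors are arbitrary illustrative values.

```python
import math


def gram(vectors):
    """A(S) = I_2 + sum_{x in S} x x^T for 2-dimensional feature vectors."""
    a = [[1.0, 0.0], [0.0, 1.0]]
    for x in vectors:
        for r in range(2):
            for c in range(2):
                a[r][c] += x[r] * x[c]
    return a


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def quad_inv2(a, x):
    """x^T a^{-1} x, using the closed-form inverse [[s, -q], [-r, p]] / det of [[p, q], [r, s]]."""
    num = a[1][1] * x[0] ** 2 - (a[0][1] + a[1][0]) * x[0] * x[1] + a[0][0] * x[1] ** 2
    return num / det2(a)


def V(vectors):
    """V(S) = (1/2) log det A(S)."""
    return 0.5 * math.log(det2(gram(vectors)))


S = [(0.6, 0.0), (0.3, 0.4)]  # illustrative selected features
x = (0.0, 0.8)  # illustrative candidate x_i
direct = V(S + [x]) - V(S)
via_sylvester = 0.5 * math.log(1.0 + quad_inv2(gram(S), x))
print(direct, via_sylvester)  # equal up to floating-point rounding
```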

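The computation of the derivative of $L$ uses standard matrix calculus: writing $A(\lambda) = I_d + \sum_{i\in\mathcal N}\lambda_ix_ix_i^T$,

$$\det A(\lambda + he_i) = \det\big(A(\lambda) + hx_ix_i^T\big) = \det A(\lambda)\,\big(1 + hx_i^TA(\lambda)^{-1}x_i\big).$$

Hence,

$$\log\det A(\lambda + he_i) = \log\det A(\lambda) + hx_i^TA(\lambda)^{-1}x_i + o(h),$$

which implies

$$\partial_iL(\lambda) = \tfrac12\,x_i^TA(\lambda)^{-1}x_i.$$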
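The closed form for $\partial_iL$ can be checked against a central finite difference. This Python sketch again assumes $d = 2$, the factor $\tfrac12$ in $L$, and arbitrary illustrative values for the $x_i$ and $\lambda$; the step `h` is a hypothetical choice balancing truncation and rounding error.

```python
import math


def A_of(lam, xs):
    """A(lambda) = I_2 + sum_i lambda_i x_i x_i^T for 2-dimensional features."""
    a = [[1.0, 0.0], [0.0, 1.0]]
    for w, x in zip(lam, xs):
        for r in range(2):
            for c in range(2):
                a[r][c] += w * x[r] * x[c]
    return a


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def L(lam, xs):
    """L(lambda) = (1/2) log det A(lambda)."""
    return 0.5 * math.log(det2(A_of(lam, xs)))


def grad_L(lam, xs, i):
    """Closed form d_i L(lambda) = (1/2) x_i^T A(lambda)^{-1} x_i, 2x2 inverse written out."""
    a, x = A_of(lam, xs), xs[i]
    num = a[1][1] * x[0] ** 2 - (a[0][1] + a[1][0]) * x[0] * x[1] + a[0][0] * x[1] ** 2
    return 0.5 * num / det2(a)


def finite_difference(lam, xs, i, h=1e-6):
    """Central difference (L(lambda + h e_i) - L(lambda - h e_i)) / (2h)."""
    up, down = list(lam), list(lam)
    up[i] += h
    down[i] -= h
    return (L(up, xs) - L(down, xs)) / (2 * h)


xs = [(0.6, 0.0), (0.3, 0.4), (0.0, 0.8)]  # illustrative feature vectors
lam = [0.2, 0.7, 0.5]  # illustrative fractional point
for i in range(3):
    print(i, grad_L(lam, xs, i), finite_difference(lam, xs, i))
```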

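For two symmetric matrices $A$ and $B$, we write $A \succ B$ ($A \succeq B$) if $A - B$ is positive definite (positive semi-definite). This order allows us to define the notions of a decreasing and of a convex matrix function, similarly to their real counterparts. With this definition, matrix inversion is decreasing and convex over symmetric positive definite matrices (see Example 3.48, p. 110 in [?]). In particular,

$$\forall S\subseteq\mathcal N,\quad A(S)^{-1} \succeq A(S\cup\{i\})^{-1},$$

as $A(S) \preceq A(S\cup\{i\})$. Observe also that $P_{\mathcal N\setminus\{i\}}^\lambda(S) = P_{\mathcal N}^\lambda(S) + P_{\mathcal N}^\lambda(S\cup\{i\})$ for all $S \subseteq \mathcal N\setminus\{i\}$. Hence,

$$\begin{aligned}\partial_iF(\lambda) &= \tfrac12\sum_{S\subseteq\mathcal N\setminus\{i\}} P_{\mathcal N}^\lambda(S)\log\big(1 + x_i^TA(S)^{-1}x_i\big) + \tfrac12\sum_{S\subseteq\mathcal N\setminus\{i\}} P_{\mathcal N}^\lambda(S\cup\{i\})\log\big(1 + x_i^TA(S)^{-1}x_i\big)\\ &\ge \tfrac12\sum_{S\subseteq\mathcal N\setminus\{i\}} P_{\mathcal N}^\lambda(S)\log\big(1 + x_i^TA(S)^{-1}x_i\big) + \tfrac12\sum_{S\subseteq\mathcal N\setminus\{i\}} P_{\mathcal N}^\lambda(S\cup\{i\})\log\big(1 + x_i^TA(S\cup\{i\})^{-1}x_i\big)\\ &= \tfrac12\sum_{S\subseteq\mathcal N} P_{\mathcal N}^\lambda(S)\log\big(1 + x_i^TA(S)^{-1}x_i\big).\end{aligned}$$

Using that $A(S) \succeq I_d$, we get that $x_i^TA(S)^{-1}x_i \le \|x_i\|_2^2 \le 1$. Moreover, $\log(1+x) \ge x/2$ for all $0 \le x \le 1$ (by concavity, $\log(1+x) \ge x\log 2$ on this interval). Hence,

$$\partial_iF(\lambda) \ge \tfrac14\,x_i^T\Big(\sum_{S\subseteq\mathcal N} P_{\mathcal N}^\lambda(S)\,A(S)^{-1}\Big)x_i.$$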
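Finally, using that the inverse is a matrix convex function over symmetric positive definite matrices, and that $\sum_{S\subseteq\mathcal N} P_{\mathcal N}^\lambda(S)\,A(S) = A(\lambda)$:

$$\partial_iF(\lambda) \ge \tfrac14\,x_i^T\Big(\sum_{S\subseteq\mathcal N} P_{\mathcal N}^\lambda(S)\,A(S)\Big)^{-1}x_i = \tfrac14\,x_i^TA(\lambda)^{-1}x_i = \tfrac12\,\partial_iL(\lambda).$$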
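The bound $\partial_iF(\lambda) \ge \tfrac12\,\partial_iL(\lambda)$ can be probed numerically. The Python sketch below computes $\partial_iF$ exactly from the subset formula and $\partial_iL$ from its closed form, on random instances with $d = 2$, $n = 4$ and $\|x_i\|_2 \le 1$ (the normalization the argument relies on). It is a spot check under these assumptions, not a proof; the seed and instance sizes are arbitrary.

```python
import math
import random
from itertools import combinations


def A_of(weights, xs):
    """I_2 + sum_k weights[k] x_k x_k^T; 0/1 weights give A(S), lambda gives A(lambda)."""
    a = [[1.0, 0.0], [0.0, 1.0]]
    for w, x in zip(weights, xs):
        for r in range(2):
            for c in range(2):
                a[r][c] += w * x[r] * x[c]
    return a


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def V(S, xs):
    """V(S) = (1/2) log det A(S)."""
    return 0.5 * math.log(det2(A_of([1.0 if k in S else 0.0 for k in range(len(xs))], xs)))


def partial_F(lam, xs, i):
    """d_i F(lambda) = sum over S in N minus {i} of P(S) * (V(S + i) - V(S))."""
    others = [k for k in range(len(xs)) if k != i]
    total = 0.0
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            p = 1.0
            for k in others:
                p *= lam[k] if k in S else 1.0 - lam[k]
            total += p * (V(set(S) | {i}, xs) - V(set(S), xs))
    return total


def partial_L(lam, xs, i):
    """d_i L(lambda) = (1/2) x_i^T A(lambda)^{-1} x_i, with the 2x2 inverse written out."""
    a, x = A_of(lam, xs), xs[i]
    num = a[1][1] * x[0] ** 2 - (a[0][1] + a[1][0]) * x[0] * x[1] + a[0][0] * x[1] ** 2
    return 0.5 * num / det2(a)


random.seed(0)
worst = float("inf")
for trial in range(200):
    xs = []
    for k in range(4):
        # Random features in the unit disk (||x_k|| <= 1), kept away from the origin.
        angle = random.uniform(0.0, 2.0 * math.pi)
        radius = 0.05 + 0.95 * random.random()
        xs.append((radius * math.cos(angle), radius * math.sin(angle)))
    lam = [random.random() for k in range(4)]
    for i in range(4):
        worst = min(worst, partial_F(lam, xs, i) / partial_L(lam, xs, i))
print(worst)  # the argument above says this never drops below 0.5
```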



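Having bounded the ratio between the partial derivatives, we now bound the ratio $F(\lambda)/L(\lambda)$ from below. Consider the following cases. First, if the minimum of the ratio $F(\lambda)/L(\lambda)$ is attained at a point interior to the hypercube, then it is a critical point, i.e., $\partial_i\big(F(\lambda)/L(\lambda)\big) = 0$ for all $i \in \mathcal N$, which reads $\partial_iF(\lambda)\,L(\lambda) = F(\lambda)\,\partial_iL(\lambda)$; hence, at such a critical point:

$$\frac{F(\lambda)}{L(\lambda)} = \frac{\partial_iF(\lambda)}{\partial_iL(\lambda)} \ge \frac12. \tag{25}$$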
Second, if the minimum is attained as $\lambda$ converges to zero in, e.g., the $\ell_2$ norm, then, since $F(0) = L(0) = 0$, by the Taylor approximation one can write:

$$\frac{F(\lambda)}{L(\lambda)} \sim \frac{\sum_{i\in\mathcal N}\lambda_i\,\partial_iF(0)}{\sum_{i\in\mathcal N}\lambda_i\,\partial_iL(0)} \ge \frac12,$$

i.e., the ratio $F(\lambda)/L(\lambda)$ is necessarily bounded from below by $1/2$ for small enough $\lambda$. Finally, if the minimum is attained on a face of the hypercube $[0,1]^n$ (a face is defined as a subset of the hypercube where one of the variables is fixed to 0 or 1), without loss of generality, we can assume that the minimum is attained on the face where the $n$-th variable has been fixed to 0 or 1. Then, either the minimum is attained at a point interior to the face or on a boundary of the face. In the first sub-case, relation (25) still characterizes the minimum for $i < n$. In the second sub-case, by repeating the argument by induction, we see that all that is left to do is to show that the bound holds for the vertices of the cube (the faces of dimension 0). The vertices are exactly the binary points, for which we know that both relaxations are equal to the value function $V$. Hence, the ratio is equal to 1 on the vertices. □
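Both inequalities of Lemma 6 can likewise be spot-checked on random instances. This brute-force Python sketch assumes $d = 2$, $n = 5$, features in the unit disk and the $\tfrac12\log\det$ form of $V$; the tolerance `1e-12` only absorbs floating-point error, and the seed is arbitrary.

```python
import math
import random
from itertools import combinations


def half_log_det(weights, xs):
    """(1/2) log det(I_2 + sum_k weights[k] x_k x_k^T) for 2-dimensional features."""
    a = [[1.0, 0.0], [0.0, 1.0]]
    for w, x in zip(weights, xs):
        for r in range(2):
            for c in range(2):
                a[r][c] += w * x[r] * x[c]
    return 0.5 * math.log(a[0][0] * a[1][1] - a[0][1] * a[1][0])


def F(lam, xs):
    """Multilinear extension (23): expectation of V(S) over S ~ P_N^lambda."""
    n = len(xs)
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            p = 1.0
            for k in range(n):
                p *= lam[k] if k in S else 1.0 - lam[k]
            total += p * half_log_det([1.0 if k in S else 0.0 for k in range(n)], xs)
    return total


def L(lam, xs):
    """Relaxation L: the expectation is moved inside the log det."""
    return half_log_det(lam, xs)


random.seed(1)
holds = True
for trial in range(500):
    xs = []
    for k in range(5):
        # Random features in the unit disk, matching the normalization ||x_k|| <= 1.
        angle = random.uniform(0.0, 2.0 * math.pi)
        radius = random.random()
        xs.append((radius * math.cos(angle), radius * math.sin(angle)))
    lam = [random.random() for k in range(5)]
    f, l = F(lam, xs), L(lam, xs)
    holds = holds and 0.5 * l <= f + 1e-12 and f <= l + 1e-12
print(holds)
```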

To conclude the proof of Lemma 3, let us consider a feasible point $\lambda^* \in [0,1]^n$ such that $L(\lambda^*) = OPT'$. By applying Lemma 6 and then Lemma 5, we get a feasible point $\bar\lambda$ with at most one fractional component such that

$$L(\lambda^*) \le 2F(\lambda^*) \le 2F(\bar\lambda). \tag{26}$$

Let $\bar\lambda_i$ denote the fractional component of $\bar\lambda$, and let $S$ denote the set whose indicator vector is $\bar\lambda - \bar\lambda_ie_i$. By definition of the multi-linear extension $F$:

$$F(\bar\lambda) = (1-\bar\lambda_i)\,V(S) + \bar\lambda_i\,V(S\cup\{i\}).$$

By submodularity of $V$, $V(S\cup\{i\}) \le V(S) + V(\{i\})$. Hence,

$$F(\bar\lambda) \le V(S) + V(\{i\}).$$

Note that since $\bar\lambda$ is feasible, $S$ is also feasible and $V(S) \le OPT$. Hence,

$$F(\bar\lambda) \le OPT + \max_{i\in\mathcal N} V(\{i\}). \tag{27}$$

Together, (26) and (27) imply the lemma. □
5.3 Proof of Theorem 2



Suppose, for contradiction, that such a mechanism exists. Consider two experiments with dimension $d = 2$, such that $x_1 = e_1 = [1, 0]$, $x_2 = e_2 = [0, 1]$ and $c_1 = c_2 = B/2 + \varepsilon$. Then, one of the two experiments, say $x_1$, must be in the set selected by the mechanism; otherwise the ratio is unbounded, a contradiction. If $x_1$ lowers its cost to $B/2 - \varepsilon$, by monotonicity it remains in the solution; by threshold payment, it is paid at least $B/2 + \varepsilon$. So $x_2$ is not included in the solution, by budget feasibility and individual rationality: hence, the selected set attains a value of $V(\{x_1\})$, while the optimal value under the new costs, attained by $\{x_1, x_2\}$, is $2V(\{x_1\})$, since $\det(I_2 + e_1e_1^T + e_2e_2^T) = 4 = \det(I_2 + e_1e_1^T)^2$. □
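The arithmetic of this counterexample can be replayed in a few lines of Python. The sketch assumes $B = 1$, a hypothetical slack $\varepsilon = 2^{-6}$ (a power of two, so the budget comparisons are exact in floating point), and the $\tfrac12\log\det$ form of $V$; the ratio of 2 does not depend on that constant factor.

```python
import math


def V(vectors):
    """V(S) = (1/2) log det(I_2 + sum_{x in S} x x^T) for 2-dimensional vectors."""
    a, b, c = 1.0, 0.0, 1.0  # entries of the symmetric matrix [[a, b], [b, c]]
    for x0, x1 in vectors:
        a += x0 * x0
        b += x0 * x1
        c += x1 * x1
    return 0.5 * math.log(a * c - b * b)


B = 1.0  # assumed budget
eps = 2.0 ** -6  # hypothetical slack, a power of two so the arithmetic below is exact
e1, e2 = (1.0, 0.0), (0.0, 1.0)

cost_x1, cost_x2 = B / 2 - eps, B / 2 + eps  # x1 has lowered its cost
both_fit = cost_x1 + cost_x2 <= B  # so the optimum now contains both experiments
payment_x1 = B / 2 + eps  # lower bound on x1's threshold payment
x2_fits = payment_x1 + cost_x2 <= B  # x2 would have to be paid at least its cost
ratio = V([e1, e2]) / V([e1])
print(both_fit, x2_fits, ratio)
```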

6 Extensions 

6.1 Strategic Experimental Design with Non-Homotropic Prior

In the general case where the prior distribution of the experimenter on the model $\beta$ in (??) is not homotropic and has a generic covariance matrix $R$, the value function takes the general form given by (??).

Applying the mechanism described in Algorithm 1 and adapting the analysis of the approximation ratio mutatis mutandis, we get the following result, which extends Theorem 1:

Theorem 3. There exists a truthful, individually rational and budget feasible mechanism for the objective function $V$ given by (??). Furthermore, for any $\varepsilon > 0$, in time $O(\mathrm{poly}(n, d, \log\log\varepsilon^{-1}))$, the algorithm computes a set $S^*$ such that:

OPT < V{S*) + 5.1 + e 

- e - 1 log(l + 

where $\mu$ is the smallest eigenvalue of $R$.

6.2 Non-Bayesian Setting 

In the non-Bayesian setting, i.e., when the experimenter has no prior distribution on the model, the covariance matrix $R$ is the zero matrix. In this case, the ridge regression estimation procedure (??) reduces to simple least squares (i.e., linear regression), and the D-optimality criterion reduces to the entropy of $\hat\beta$, given by:

$$V(S) = \log\det(X_S^TX_S). \tag{28}$$

A natural question that arises is whether it is possible to design a deterministic mechanism in this setting. Since (28) may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $\tilde V(S) = \det(X_S^TX_S)$. However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality.
Lemma 7. For any $M > 1$, there is no $M$-approximate, truthful, budget feasible, individually rational mechanism for a budget feasible reverse auction with value function $V(S) = \det(X_S^TX_S)$.

Proof. Given $M > 1$, consider $n = 4$ experiments of dimension $d = 2$. For $e_1, e_2$ the standard basis vectors in $\mathbb R^2$, let $x_1 = e_1$, $x_2 = e_2$, and $x_3 = \sqrt\delta\,e_1$, $x_4 = \sqrt\delta\,e_2$, where $0 < \delta < 1/(M-1)$, so that $(1+\delta)/\delta > M$. Moreover, let the budget be $B = 1$ and assume that $c_1 = c_2 = 0.5 + \varepsilon$, while $c_3 = c_4 = \varepsilon$, for some small $\varepsilon > 0$. Suppose, for the sake of contradiction, that there exists a mechanism with approximation ratio $M$. Then, it must include in the solution $S$ at least one of $x_1$ or $x_2$: if not, then $V(S) \le \delta^2$, while $OPT = (1+\delta)\delta$, a contradiction. Suppose thus that the solution contains $x_1$. By the monotonicity property, if the cost of experiment $x_1$ reduces to $B/2 - 3\varepsilon$, $x_1$ will still be in the solution. By threshold payments, experiment $x_1$ receives in this case a payment that is at least $B/2 + \varepsilon$. By individual rationality and budget feasibility, $x_2$ cannot be included in the solution, so $V(S)$ is at most $(1+\delta)\delta$. However, the optimal solution now includes all experiments, and yields $OPT = (1+\delta)^2$, a contradiction. □
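The determinants in this proof can be checked directly in Python. The sketch fixes the hypothetical values $M = 3$ and $\delta = 0.4 < 1/(M-1)$, assumes $x_3 = \sqrt\delta\,e_1$ and $x_4 = \sqrt\delta\,e_2$, and evaluates $\det(X_S^TX_S)$ for the sets that appear in the argument; both ratios equal $(1+\delta)/\delta = 3.5 > M$ up to rounding.

```python
import math


def det_gram(vectors):
    """det(X_S^T X_S) for 2-dimensional feature vectors (no identity term in this setting)."""
    a = b = c = 0.0
    for x0, x1 in vectors:
        a += x0 * x0
        b += x0 * x1
        c += x1 * x1
    return a * c - b * b


M = 3.0  # hypothetical approximation ratio to refute
delta = 0.4  # any 0 < delta < 1 / (M - 1) = 0.5 works
s = math.sqrt(delta)
x1, x2, x3, x4 = (1.0, 0.0), (0.0, 1.0), (s, 0.0), (0.0, s)

# Original costs: x1 and x2 cannot both be bought, so OPT is attained by {x1, x3, x4}.
opt_before = det_gram([x1, x3, x4])  # (1 + delta) * delta
best_without_x1_x2 = det_gram([x3, x4])  # delta ** 2
# After x1 lowers its cost, all four fit in the budget, but x2 is priced out.
opt_after = det_gram([x1, x2, x3, x4])  # (1 + delta) ** 2
best_with_x1_only = det_gram([x1, x3, x4])  # (1 + delta) * delta

print(opt_before / best_without_x1_x2, opt_after / best_with_x1_only)
```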

6.3 Beyond Linear Models 

Selecting experiments that maximize the information gain in the Bayesian setup leads to a natural generalization to other learning examples beyond linear regression. In particular, consider the following variant of the standard PAC learning setup [26]: assume that the features $x_i$, $i \in \mathcal N$, take values in some generic set $\Omega$, called the query space. Measurements $y_i \in \mathbb R$ are given by

$$y_i = h(x_i) + \varepsilon_i, \tag{29}$$

where $h \in \mathcal H$ for some subset $\mathcal H$ of all possible mappings $h : \Omega \to \mathbb R$, called the hypothesis space. As before, we assume that the experimenter has a prior distribution on the hypothesis $h \in \mathcal H$; we also assume that the $\varepsilon_i$ are random variables in $\mathbb R$, not necessarily identically distributed, that are independent conditioned on $h$. As before, the features $x_i$ are public, and the goal of the experimenter is to (a) retrieve measurements $y_i$ and (b) estimate $h$ as accurately as possible.

This model is quite broad, and captures many classic machine learning tasks; we give a few concrete examples below:

1. Generalized Linear Regression. In this case, $\Omega = \mathbb R^d$, $\mathcal H$ is the set of linear maps $\{h(x) = \beta^Tx \text{ s.t. } \beta \in \mathbb R^d\}$, and the $\varepsilon_i$ are independent zero-mean variables (not necessarily identically distributed).

2. Learning Binary Functions with Bernoulli Noise. When learning a binary function under noise, the experimenter wishes to determine a binary function $h$ by testing its output on different inputs; however, the output may be corrupted (flipped) with probability $p$. Formally, $\Omega = \{0,1\}^d$, $\mathcal H$ is some subset of binary functions $h : \Omega \to \{0,1\}$, and

$$\varepsilon_i = \begin{cases} 0, & \text{w. prob. } 1-p,\\ 1 - 2h(x_i), & \text{w. prob. } p.\end{cases}$$
3. Logistic Regression. Logistic regression aims to learn a hyperplane separating $+1$-labeled values from $-1$-labeled values; again, values can be corrupted, and the probability that a label is flipped drops with the distance from the hyperplane. Formally, $\Omega = \mathbb R^d$, $\mathcal H$ is the set of maps $\{h(x) = \mathrm{sign}(\beta^Tx) \text{ for some } \beta \in \mathbb R^d\}$, and the $\varepsilon_i$ are independent conditioned on $\beta$ such that

$$\varepsilon_i = \begin{cases} -2\cdot\mathbb 1_{\beta^Tx_i > 0}, & \text{w. prob. } \dfrac{1}{1+e^{\beta^Tx_i}},\\[2mm] +2\cdot\mathbb 1_{\beta^Tx_i < 0}, & \text{w. prob. } \dfrac{1}{1+e^{-\beta^Tx_i}}.\end{cases}$$
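The noise models in the last two examples can be sanity-checked by enumerating the distribution of $\varepsilon_i$. This Python sketch assumes, for the binary example, $\varepsilon_i = 0$ with probability $1-p$ and $\varepsilon_i = 1 - 2h(x_i)$ with probability $p$ (a bit flip), and, for logistic regression, $\varepsilon_i = -2\cdot\mathbb 1_{\beta^Tx_i>0}$ with probability $1/(1+e^{\beta^Tx_i})$ and $\varepsilon_i = 2\cdot\mathbb 1_{\beta^Tx_i<0}$ with probability $1/(1+e^{-\beta^Tx_i})$, with $\beta^Tx_i \ne 0$. It checks that $y_i = h(x_i) + \varepsilon_i$ differs from $h(x_i)$ with probability $p$ in the first case, and equals $+1$ with the logistic probability $1/(1+e^{-\beta^Tx_i})$ in the second. The function names are hypothetical.

```python
import math


def bernoulli_noise(h_value, p):
    """(eps, probability) pairs for the binary example: h(x) in {0, 1} is flipped w.p. p."""
    return [(0, 1 - p), (1 - 2 * h_value, p)]


def logistic_noise(t):
    """(eps, probability) pairs for the logistic example, with margin t = beta^T x != 0."""
    return [(-2 * (t > 0), 1 / (1 + math.exp(t))), (2 * (t < 0), 1 / (1 + math.exp(-t)))]


def prob_y_equals(h_value, noise, target):
    """P(y = target) where y = h(x) + eps, as in (29)."""
    return sum(p for eps, p in noise if h_value + eps == target)


# Bit-flip noise: the observed bit differs from h(x) exactly with probability p.
for h_value in (0, 1):
    print(prob_y_equals(h_value, bernoulli_noise(h_value, 0.1), 1 - h_value))

# Logistic noise: P(y = +1) matches the logistic function 1 / (1 + exp(-t)).
for t in (-2.0, 0.5, 3.0):
    h_value = 1 if t > 0 else -1  # h(x) = sign(beta^T x)
    print(prob_y_equals(h_value, logistic_noise(t), 1), 1 / (1 + math.exp(-t)))
```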



We can again define the information gain as an objective to maximize:

$$V(S) = H(h) - H(h \mid y_S), \quad S \subseteq \mathcal N. \tag{30}$$

This is a monotone set function, and it clearly satisfies $V(\emptyset) = 0$. In general, the information gain is not a submodular function. However, when the errors $\varepsilon_i$ are independent conditioned on $h$, the following lemma holds:

Lemma 8. The value function given by the information gain (30) is submodular.

Proof. A more general statement for graphical models is shown in [16]; in short, using the chain rule for the conditional entropy, we get:

$$V(S) = H(y_S) - H(y_S \mid h) = H(y_S) - \sum_{i\in S} H(y_i \mid h), \tag{31}$$

where the second equality comes from the independence of the $y_i$'s conditioned on $h$. Recall that the joint entropy of a set of random variables is a submodular function. Thus, the value function is written in (31) as the sum of a submodular function and a modular function. □
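Lemma 8 can be illustrated on a toy instance of the binary-function example with bit-flip noise. The Python sketch assumes a hypothetical hypothesis class of three equally likely binary functions on three queries and a flip probability of 0.1; it computes $V(S)$ both from the definition (30) and from the decomposition (31), then checks the diminishing-returns inequality over all pairs $S \subseteq T$. Entropies are in bits, which only rescales $V$.

```python
import math
from itertools import combinations, product

# Hypothetical toy instance: three equally likely binary hypotheses on queries {0, 1, 2},
# each label observed through an independent bit flip with probability P_FLIP.
HYPOTHESES = [(0, 0, 1), (0, 1, 1), (1, 1, 0)]
PRIOR = [1 / 3, 1 / 3, 1 / 3]
P_FLIP = 0.1
QUERIES = (0, 1, 2)


def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def likelihood(h, S, ys):
    """P(y_S = ys | h); the flips are independent given h."""
    p = 1.0
    for q, y in zip(S, ys):
        p *= (1 - P_FLIP) if y == h[q] else P_FLIP
    return p


def gain_definition(S):
    """V(S) = H(h) - H(h | y_S), equation (30), via Bayes' rule."""
    conditional = 0.0
    for ys in product((0, 1), repeat=len(S)):
        joint = [ph * likelihood(h, S, ys) for h, ph in zip(HYPOTHESES, PRIOR)]
        py = sum(joint)
        conditional += py * entropy([pj / py for pj in joint])
    return entropy(PRIOR) - conditional


def gain_decomposed(S):
    """V(S) = H(y_S) - sum_{i in S} H(y_i | h), equation (31)."""
    marginals = [sum(ph * likelihood(h, S, ys) for h, ph in zip(HYPOTHESES, PRIOR))
                 for ys in product((0, 1), repeat=len(S))]
    return entropy(marginals) - len(S) * entropy([P_FLIP, 1 - P_FLIP])


SUBSETS = [c for r in range(len(QUERIES) + 1) for c in combinations(QUERIES, r)]
gap = max(abs(gain_definition(S) - gain_decomposed(S)) for S in SUBSETS)

# Diminishing returns: V(S + i) - V(S) >= V(T + i) - V(T) for S a subset of T, i outside T.
submodular = all(
    gain_decomposed(tuple(sorted(S + (i,)))) - gain_decomposed(S)
    >= gain_decomposed(tuple(sorted(T + (i,)))) - gain_decomposed(T) - 1e-12
    for T in SUBSETS
    for S in SUBSETS
    if set(S) <= set(T)
    for i in QUERIES
    if i not in T
)
print(gap, submodular)
```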

This lemma implies that learning an arbitrary hypothesis, under an arbitrary prior, when noise is conditionally independent, leads to a submodular value function. Hence, we can apply the previously known results by [24] and [8] to get the following corollary:

Corollary 1. For Bayesian experimental design with an objective given by the information gain (30), there exists a randomized, polynomial-time, budget feasible, individually rational, and universally truthful mechanism with a 7.91 approximation ratio, in expectation.

In cases where maximizing (30) can be done in polynomial time in the full-information setup, there exists a deterministic, polynomial-time, budget feasible, individually rational, and truthful mechanism for Bayesian experimental design with an 8.34 approximation ratio.

Note, however, that in many scenarios covered by this model (including the last two examples above), even computing the entropy under a given set might be a hard task, i.e., the value query model may not apply. Hence, identifying learning tasks in the above class for which truthful or universally truthful constant approximation mechanisms exist, or studying these problems in the context of stronger query models such as the demand model [11, 5], remains an interesting open question.
Appendix



Lemma 9. Our mechanism for EDP is monotone and budget feasible. 

Proof. Consider an agent $i$ with cost $c_i$ that is selected by the mechanism, and suppose that she reports a cost $c_i' \le c_i$ while all other costs stay the same. Suppose that when $i$ reports $c_i$, $L(\xi) \ge C\,V(i^*)$; then, as $s_i(c_i, c_{-i}) = 1$, $i \in S_G$. By reporting a cost $c_i' \le c_i$, $i$ may be selected at an earlier iteration of the greedy algorithm. Denote by $S_i$ (resp. $S_i'$) the set to which $i$ is added when reporting cost $c_i$ (resp. $c_i'$). We have $S_i' \subseteq S_i$; in addition, $S_i' \subseteq S_G'$, the set selected by the greedy algorithm under $(c_i', c_{-i})$; if not, then greedy selection would terminate prior to selecting $i$ also when she reports $c_i$, a contradiction. Moreover, we have

$$c_i' \le c_i \le \frac B2\,\frac{V(S_i\cup\{i\}) - V(S_i)}{V(S_i\cup\{i\})} \le \frac B2\,\frac{V(S_i'\cup\{i\}) - V(S_i')}{V(S_i'\cup\{i\})}$$

by the monotonicity and submodularity of $V$. Hence $i \in S_G'$. As $L(\xi)$ is the optimal value of (17) under relaxation $L$ when $i^*$ is excluded from $\mathcal N$, reducing the costs can only increase this value, so under $c_i' \le c_i$ the greedy set is still allocated and $s_i(c_i', c_{-i}) = 1$. Suppose now that when $i$ reports $c_i$, $L(\xi) < C\,V(i^*)$. Then $s_i(c_i, c_{-i}) = 1$ iff $i = i^*$. Reporting $c_{i^*}' \le c_{i^*}$ changes neither $V(i^*)$ nor the condition $L(\xi) < C\,V(i^*)$; thus $s_{i^*}(c_{i^*}', c_{-i^*}) = 1$, so the mechanism is monotone.

To show budget feasibility, suppose that $L(\xi) < C\,V(i^*)$. Then the mechanism selects $i^*$. Since the bid of $i^*$ does not affect the above condition, the threshold payment of $i^*$ is $B$ and the mechanism is budget feasible. Suppose now that $L(\xi) \ge C\,V(i^*)$. Denote by $S_G$ the set selected by the greedy algorithm, and for $i \in S_G$, denote by $S_i$ the subset of the solution set that was selected by the greedy algorithm just prior to the addition of $i$; both sets are determined for the present cost vector $c$. Then for any submodular function $V$, and for all $i \in S_G$:

$$\text{if } c_i' > \frac{V(S_i\cup\{i\}) - V(S_i)}{V(S_G)}\,B \text{ then } s_i(c_i', c_{-i}) = 0. \tag{32}$$

In other words, if $i$ increases her cost to a value higher than $\frac{V(S_i\cup\{i\}) - V(S_i)}{V(S_G)}\,B$, she will cease to be in the selected set $S_G$. As a result, (32) implies that the threshold payment of user $i$ is bounded by the above quantity. Hence, the total payment is bounded by the telescopic sum:

$$\sum_{i\in S_G}\frac{V(S_i\cup\{i\}) - V(S_i)}{V(S_G)}\,B = \frac{V(S_G) - V(\emptyset)}{V(S_G)}\,B = B.$$

□
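The telescoping bound on payments can be illustrated with a small Python sketch. The greedy rule used here, which adds the agent with the largest marginal value per unit cost while $c_i \le \frac B2\,(V(S\cup\{i\}) - V(S))/V(S\cup\{i\})$, is an assumption reconstructed from the inequality in the monotonicity argument, not necessarily the exact mechanism; the coverage value function, costs and budget are hypothetical. Whatever order the greedy stage produces, the per-agent bounds from (32) sum to $B$ because $V(\emptyset) = 0$.

```python
# Hypothetical monotone submodular value function with V(empty set) = 0: set coverage.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}


def V(S):
    return len(set().union(*(SETS[k] for k in S)))


def greedy(costs, B):
    """Hypothetical greedy stage: repeatedly add the agent with the best marginal value per
    unit cost while c_i <= (B / 2) * (V(S + i) - V(S)) / V(S + i); returns agents in order."""
    order, chosen = [], set()
    remaining = set(costs)
    while remaining:
        i = max(sorted(remaining), key=lambda k: (V(chosen | {k}) - V(chosen)) / costs[k])
        gain = V(chosen | {i}) - V(chosen)
        if gain == 0 or costs[i] > B / 2 * gain / V(chosen | {i}):
            break
        order.append(i)
        chosen.add(i)
        remaining.remove(i)
    return order


def payment_bounds(order, B):
    """Bounds from (32) on each threshold payment: B * (V(S_i + i) - V(S_i)) / V(S_G)."""
    total, prefix, bounds = V(order), set(), {}
    for i in order:
        bounds[i] = B * (V(prefix | {i}) - V(prefix)) / total
        prefix.add(i)
    return bounds


B = 10.0  # hypothetical budget
costs = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 3.0}  # hypothetical reported costs
order = greedy(costs, B)
bounds = payment_bounds(order, B)
print(order, sum(bounds.values()))  # the bounds telescope to B, up to rounding
```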